Refined Rate of Channel Polarization 



Toshiyuki Tanaka and Ryuhei Mori 
Graduate School of Informatics, Kyoto University, Kyoto-shi, Kyoto, 606-8501 Japan, 
e-mail: tt@i.kyoto-u.ac.jp, rmori@sys.i.kyoto-u.ac.jp



Abstract — A rate-dependent upper bound of the best achiev- 
able block error probability of polar codes with successive- 
cancellation decoding is derived. 

I. Introduction 

Channel polarization [1] is a method for constructing a family of error-correcting codes called polar codes. Polar codes have been attracting theoretical interest because they achieve capacity on binary-input symmetric memoryless channels (B-SMCs), and achieve the symmetric capacity of general binary-input memoryless channels (B-MCs), while the computational complexity of encoding and decoding is polynomial in the block length. Since the first proposal [1], a number of contributions regarding channel polarization and polar codes have appeared in the literature [2]-

Of particular theoretical interest is the analysis of how fast the best achievable block error probability $P_e$ of polar codes decays toward zero as the block length $N$ tends to infinity. Arikan [1] has shown that $P_e$ tends to zero as $N \to \infty$ whenever the code rate $R$ is less than the symmetric capacity of the underlying channel. The upper bound he obtained is proportional to a negative power of $N$, which means that the guaranteed speed of convergence to zero is very slow. His result was subsequently improved by Arikan and Telatar [4], who obtained a much tighter upper bound, which scales as exponential in $-N^{\beta}$ for $\beta < 1/2$. Neither of these bounds, however, depends on the code rate $R$. A rate-dependent bound is more desirable, since one naturally expects a smaller error probability from a smaller code rate, which might in turn suggest that the rate-independent bounds are not tight.

In this paper, we present an analysis of the rate of channel polarization. The argument basically follows that of Arikan and Telatar [4], but extends it to obtain rate-dependent bounds of the best achievable error probability.

II. Problem 

Let $W : \mathcal{X} \to \mathcal{Y}$ be an arbitrary binary-input memoryless channel (B-MC) with input alphabet $\mathcal{X} = \{0, 1\}$, output alphabet $\mathcal{Y}$, and channel transition probabilities $\{W(y|x) : x \in \mathcal{X},\ y \in \mathcal{Y}\}$. Let $I(W)$ be the symmetric capacity of $W$, which is defined as the mutual information between the input and output of $W$ when the input is uniformly distributed over $\mathcal{X}$. It is an upper bound of achievable rates over $W$ with codes that use input symbols with equal frequency. Let the Bhattacharyya parameter $Z(W)$ of the channel $W$ be defined as

$$Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\,W(y|1)}.$$

It is an upper bound of the maximum-likelihood estimation error for a single channel usage.

Polar codes are constructed on the basis of recursive application of a channel combining and splitting operation. In this operation, two independent copies of a channel $W$ are combined and then split to generate two different channels $W^- : \mathcal{X} \to \mathcal{Y}^2$ and $W^+ : \mathcal{X} \to \mathcal{Y}^2 \times \mathcal{X}$. The operation, in its most basic form, is defined as

$$\begin{aligned} W^-(y_1, y_2 | x_1) &= \sum_{x_2 \in \mathcal{X}} \frac{1}{2} W(y_1 | (xF)_1)\, W(y_2 | (xF)_2), \\ W^+(y_1, y_2, x_1 | x_2) &= \frac{1}{2} W(y_1 | (xF)_1)\, W(y_2 | (xF)_2), \end{aligned} \tag{1}$$

with

$$F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad x = (x_1, x_2).$$



It has been shown [1] that

$$Z(W^+) = Z(W)^2, \tag{2}$$

$$Z(W) \le Z(W^-) \le 2Z(W) - Z(W)^2. \tag{3}$$
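The relations (2) and (3) can be checked numerically on a small example. The following is a minimal sketch, assuming a binary symmetric channel with crossover probability 0.1 as the test channel; the helper names `bhattacharyya`, `split`, and `bsc` are illustrative and do not come from the paper. A channel is stored as a dictionary mapping each output $y$ to the pair $(W(y|0), W(y|1))$.

```python
import itertools
import math


def bhattacharyya(W):
    """Z(W) = sum over outputs y of sqrt(W(y|0) * W(y|1))."""
    return sum(math.sqrt(p0 * p1) for p0, p1 in W.values())


def split(W):
    """One combining/splitting step (1) with F = [[1, 0], [1, 1]].

    For x = (x1, x2) one has xF = (x1 XOR x2, x2).
    """
    W_minus, W_plus = {}, {}
    for y1, y2 in itertools.product(W, repeat=2):
        # W^-(y1, y2 | x1) = sum_{x2} (1/2) W(y1 | x1^x2) W(y2 | x2)
        W_minus[(y1, y2)] = tuple(
            sum(0.5 * W[y1][x1 ^ x2] * W[y2][x2] for x2 in (0, 1))
            for x1 in (0, 1)
        )
        # W^+(y1, y2, x1 | x2) = (1/2) W(y1 | x1^x2) W(y2 | x2)
        for x1 in (0, 1):
            W_plus[(y1, y2, x1)] = tuple(
                0.5 * W[y1][x1 ^ x2] * W[y2][x2] for x2 in (0, 1)
            )
    return W_minus, W_plus


def bsc(p):
    """Binary symmetric channel (an assumed test channel)."""
    return {0: (1 - p, p), 1: (p, 1 - p)}


W = bsc(0.1)
W_minus, W_plus = split(W)
z = bhattacharyya(W)
z_minus = bhattacharyya(W_minus)
z_plus = bhattacharyya(W_plus)
print(z, z_minus, z_plus)
```

For this channel the printed values should satisfy $Z(W^+) = Z(W)^2$ and $Z(W) \le Z(W^-) \le 2Z(W) - Z(W)^2$ up to floating-point error.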



In constructing polar codes, we recursively generate channels with the channel combining and splitting operation, starting with the given channel $W$, as

$$\begin{aligned} W &\to \{W^-, W^+\} \to \{W^{--}, W^{-+}, W^{+-}, W^{++}\} \\ &\to \{W^{---}, W^{--+}, W^{-+-}, W^{-++}, W^{+--}, W^{+-+}, W^{++-}, W^{+++}\} \to \cdots, \end{aligned} \tag{4}$$

where we have adopted the shorthand notation $W^{--} = (W^-)^-$, etc.

Following Arikan [1], this process of recursive generation of channels can be dealt with by introducing a channel-valued stochastic process, defined as follows. Let $\{B_1, B_2, \ldots\}$ be a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with $P(B_1 = 0) = P(B_1 = 1) = 1/2$. Given a channel $W$, we define a sequence of channel-valued random variables $\{W_0, W_1, \ldots\}$ as

$$W_0 = W, \qquad W_{n+1} = \begin{cases} W_n^- & \text{if } B_{n+1} = 0, \\ W_n^+ & \text{if } B_{n+1} = 1. \end{cases} \tag{5}$$

We also define a real-valued random process $\{Z_0, Z_1, \ldots\}$ via $Z_n = Z(W_n)$.



Conceptually, a polar code is constructed by picking channels of good quality among the $N = 2^n$ realizations of $W_n$. We use these selected channels for transmitting data, while predetermined values are transmitted over the remaining unselected channels. Thus, the rate of the resulting polar code is $R$ if we pick $NR$ channels. We are interested in the performance of polar codes under successive cancellation (SC) decoding, which is defined in [1]. Let $P_e(N, R)$ be the best achievable block error probability of polar codes of block length $N$ and rate $R$ under SC decoding. Since the Bhattacharyya parameter $Z(W)$ serves as an upper bound of the bit error probability in each step of SC decoding, an inequality of the form

$$P(Z_n \le \gamma) \ge R \tag{6}$$

implies $P_e(N, R) \le NR\gamma$ via the union bound.

It has been proved [1] that, for any $R < I(W)$, there exists a polar code with block length $N = 2^n$ whose block error probability $P_e(N, R)$ is arbitrarily close to 0. The proof is based on showing the condition (6) to hold for $\gamma \in o(N^{-1})$.
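The selection of good channels and the union bound following (6) can be illustrated on the binary erasure channel. This is a sketch under an explicit assumption: for the BEC with erasure probability $\epsilon$ the recursion $Z(W^-) = 2Z(W) - Z(W)^2$, $Z(W^+) = Z(W)^2$ holds with equality, so all $N = 2^n$ Bhattacharyya parameters can be enumerated exactly. The function names are illustrative.

```python
def bec_bhattacharyya(eps, n):
    """All 2^n values of Z(W_n) when W is a BEC with erasure probability eps.

    Assumes the BEC, for which the upper bound in (3) is attained.
    """
    zs = [eps]
    for _ in range(n):
        nxt = []
        for z in zs:
            nxt.append(2 * z - z * z)  # Z of the "minus" channel
            nxt.append(z * z)          # Z of the "plus" channel, as in (2)
        zs = nxt
    return zs


def union_bounds(eps, n, rate):
    """Pick the N*R channels with the smallest Z and bound the block error.

    Returns (sum of the selected Z values, N*R*gamma), where gamma is the
    largest selected Z, so that P(Z_n <= gamma) >= R as in (6).
    """
    zs = sorted(bec_bhattacharyya(eps, n))
    k = int(rate * len(zs))
    gamma = zs[k - 1]
    return sum(zs[:k]), k * gamma


fine, coarse = union_bounds(eps=0.5, n=10, rate=0.25)
print(fine, coarse)
```

The first returned value is the sum of the selected Bhattacharyya parameters, which is never larger than the coarser quantity $NR\gamma$ used in the text.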

III. Main Result 

The main contribution of this paper is to prove the following theorem, which improves the results in [1], [4], giving a rate-dependent upper bound of the block error probability.

Theorem 1: Let $W$ be any B-MC with $I(W) > 0$. Let $R \in (0, I(W))$ be fixed. Then, for $N = 2^n$, $n \in \mathbb{N}$, the best achievable block error probability $P_e(N, R)$ satisfies

$$P_e(N = 2^n, R) = o\left(2^{-2^{(n + t\sqrt{n})/2}}\right) \tag{7}$$

for any $t$ satisfying $t < Q^{-1}(R/I(W))$, where $Q(x) = \int_x^{\infty} e^{-u^2/2}\,du / \sqrt{2\pi}$.
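To see how the rate enters the bound, one can evaluate the threshold $Q^{-1}(R/I(W))$ and the double exponent $(n + t\sqrt{n})/2$ numerically. The sketch below uses only the Python standard library; the function names and the sample values of $I(W)$, $R$, and $n$ are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def q_inverse(p):
    """Inverse of Q(x) = 1 - Phi(x), with Phi the standard normal CDF."""
    return NormalDist().inv_cdf(1.0 - p)


def double_exponent(n, t):
    """(n + t*sqrt(n))/2, i.e. log2 log2 of the reciprocal of the bound in (7)."""
    return (n + t * math.sqrt(n)) / 2


capacity = 0.5  # assumed value of I(W)
n = 20          # block length N = 2^n
for rate in (0.1, 0.25, 0.4):
    t_max = q_inverse(rate / capacity)  # Theorem 1 requires t < t_max
    print(rate, t_max, double_exponent(n, t_max))
```

Smaller rates give a larger admissible $t$ and hence a larger double exponent, which is the rate dependence stated in Theorem 1; at $R = I(W)/2$ the threshold is $t = 0$.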

IV. Proof 

A. Outline 

The proof basically follows that of Arikan and Telatar [4] but extends it in several respects. It consists of three stages, which we call polarization, concentration, and bootstrapping, respectively. In the first stage, it will be argued that realizations of $Z_n$ are in $(0, \zeta]$ for some $\zeta > 0$ with probability arbitrarily close to $I(W)$ as $n$ becomes large. This corresponds to the fundamental result of channel polarization [1]. In the second stage, concentration will be argued, that is, again with probability arbitrarily close to $I(W)$ as $n$ gets large, realizations of $Z_n$ are in $(0, f_n]$ for some $f_n$ approaching zero exponentially in $n$. In the last stage, we will argue that, once $Z_m$ for some $m$ enters the interval $(0, f_n]$, the sequence $Z_{m+1}, \ldots, Z_n$ is rapidly decreasing with overwhelming probability, which is a refinement of the "bootstrapping argument" of [4]. The last stage is further divided into two substages, the rate-independent bootstrapping stage and the rate-dependent bootstrapping stage, the latter of which is crucial in order to see the dependence on the code rate.



B. Preliminaries 

For $m, n \in \mathbb{N}$ with $m < n$, define

$$S_{m,n} = \sum_{i=m+1}^{n} B_i, \tag{8}$$

which follows a binomial distribution, since it is a sum of i.i.d. Bernoulli random variables.

Definition 1: For a fixed $\gamma \in [0, 1]$, let $\mathcal{G}_{m,n}(\gamma)$ be the event defined by

$$\mathcal{G}_{m,n}(\gamma) = \{S_{m,n} \ge \gamma(n - m)\}.$$

From the law of large numbers,

$$\lim_{n-m \to \infty} P(\mathcal{G}_{m,n}(\gamma)) = 1 \tag{9}$$

holds if $\gamma < 1/2$.

C. Random Process 

We now consider a random process $X_n \in [0, 1]$ satisfying the following properties.

1) $X_n$ converges to a random variable $X_\infty$ almost surely.

2) Conditional on $X_n$, if $X_n \ne 0, 1$,

$$X_{n+1} \begin{cases} \in [X_n, qX_n] & \text{if } B_{n+1} = 0, \\ = X_n^2 & \text{if } B_{n+1} = 1, \end{cases}$$

for a constant $q > 1$, and $X_{n+1} = X_n$ with probability 1 for $X_n = 0$ or $1$.

Equations (2) and (3) imply that the random process $Z_n$ satisfies the above properties with $q = 2$. It should be noted that properties 1 and 2 imply $P(X_\infty \in \{0, 1\}) = 1$.

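Definition 2: For $\zeta \in (0, 1)$ and $n \in \mathbb{N}$, define an event $\mathcal{T}_n(\zeta)$ with

$$\mathcal{T}_n(\zeta) = \{X_i \le \zeta;\ \forall i \ge n\}.$$

The following lemma is an immediate consequence of the above definition.

Lemma 1: For any fixed $\zeta \in (0, 1)$,

$$\lim_{n \to \infty} P(\mathcal{T}_n(\zeta)) = P(X_\infty = 0).$$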

D. Concentration 

For large enough $n$, one can expect that $X_n$ is exponentially small in $n$ with probability arbitrarily close to $P(X_\infty = 0)$. In other words, a $P(X_\infty = 0)$-fraction of realizations of $X_n$ "concentrates" toward zero. To formalize the above statement, we introduce the following definition.

Definition 3: Let $\rho \in (0, 1)$ and $\beta \in (0, 1/2)$. The events $\mathcal{C}_n(\rho)$ and $\mathcal{D}_n(\beta)$ are defined as

$$\mathcal{C}_n(\rho) = \{X_n \le \rho^n\}, \tag{10}$$

$$\mathcal{D}_n(\beta) = \left\{X_n \le 2^{-2^{\beta n}}\right\}, \tag{11}$$

respectively.

We will first prove that the event $\mathcal{C}_n(\rho)$ has a probability arbitrarily close to $P(X_\infty = 0)$ as $n$ tends to infinity, on the basis of which we will next prove that the event $\mathcal{D}_n(\beta)$ has a probability arbitrarily close to $P(X_\infty = 0)$ as $n \to \infty$.

The result for the event $\mathcal{C}_n$ is proved in the following proposition, on the basis of which the result for the event $\mathcal{D}_n$ is proved in the bootstrapping stage.

Proposition 1: For an arbitrary fixed $\rho \in (0, 1)$, let $\mathcal{C}_n(\rho)$ be the event defined in (10). Then,

$$\lim_{n \to \infty} P(\mathcal{C}_n(\rho)) = P(X_\infty = 0).$$

The proof is essentially the same as that for Theorem 2 in [1], and is omitted due to space limitations.

E. Bootstrapping: Rate-Independent Stage 

For some $m \ll n$, once a realization of $X_m$ becomes small enough, one can assure, with probability very close to 1, that samples conditionally generated on the realization of $X_m$ will converge to zero exponentially fast. This is the basic idea leading to the "bootstrapping argument" of [4]. We basically follow the same idea.

The proof regarding the bootstrapping stage is based on a consideration of properties of a process $\{L_i\}$, defined on the basis of $\{X_i\}$ as

$$L_i = \log_2 X_i, \qquad i = 0, \ldots, m, \tag{12}$$

$$L_{i+1} = \begin{cases} 2L_i & \text{if } B_{i+1} = 1, \\ L_i + \log_2 q & \text{if } B_{i+1} = 0, \end{cases} \qquad i \ge m \tag{13}$$

for a fixed $m$. The inequality $X_i \le 2^{L_i}$ holds on the sample-path basis for all $i \ge 0$.

If we fix $L_m$ and $S_{m,n}$, the largest value of $L_n$ is achieved by the sequence $\{B_{m+1}, \ldots, B_n\}$ of $(n - m - S_{m,n})$ consecutive 0s followed by $S_{m,n}$ consecutive 1s. One therefore obtains

$$L_n \le 2^{S_{m,n}} \left[ L_m + (n - m - S_{m,n}) \log_2 q \right]. \tag{14}$$
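The bound (14) can be confirmed by brute force for short sequences. The following sketch enumerates every bit sequence $\{B_{m+1}, \ldots, B_n\}$ of a given length, runs the recursion (13), and compares the result with the right-hand side of (14). The function names, the starting value $L_m = -3$, and the choice $\log_2 q = 1$ (that is, $q = 2$) are assumptions made for illustration.

```python
import itertools


def final_L(L_m, bits, log2_q):
    """Run the recursion (13) from L_m over the given bits."""
    L = L_m
    for b in bits:
        L = 2 * L if b == 1 else L + log2_q
    return L


def bound_14_holds(L_m, length, log2_q=1.0):
    """Check (14) for every bit sequence of the given length n - m."""
    for bits in itertools.product((0, 1), repeat=length):
        s = sum(bits)  # plays the role of S_{m,n}
        bound = 2 ** s * (L_m + (length - s) * log2_q)
        if final_L(L_m, bits, log2_q) > bound + 1e-9:
            return False
    return True


print(bound_14_holds(-3.0, 8))
# The extremal ordering: all 0s first, then all 1s.
print(final_L(-3.0, (0, 0, 1, 1, 1), 1.0))
```

With two 0s followed by three 1s the recursion gives $2^3(-3 + 2) = -8$, which coincides with the right-hand side of (14), in line with the statement that this ordering attains the largest value.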



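Lemma 2: Fix $\gamma \in [0, 1]$ and $\epsilon > 0$, and let $\rho = \rho(\gamma)$ be such that $\log_2 \rho = -(1 - \gamma)(n - m) \log_2 q / m - \epsilon$ holds. Then, conditional on $\mathcal{C}_m(\rho(\gamma)) \cap \mathcal{G}_{m,n}(\gamma)$, the inequality

$$L_n \le -2^{\gamma(n - m)} \epsilon m$$

holds.

Proof: Conditional on $\mathcal{C}_m(\rho) \cap \mathcal{G}_{m,n}(\gamma)$, one has, from (14), the inequality

$$L_n \le 2^{S_{m,n}} \left[ m \log_2 \rho + (1 - \gamma)(n - m) \log_2 q \right].$$

Letting $\rho = \rho(\gamma)$ makes the bracket equal to $-\epsilon m$, and $S_{m,n} \ge \gamma(n - m)$ on $\mathcal{G}_{m,n}(\gamma)$ completes the proof. ■

Proposition 2: For an arbitrary fixed $\beta \in (0, 1/2)$, let $\mathcal{D}_n(\beta)$ be the event defined in (11). Then,

$$\lim_{n \to \infty} P(\mathcal{D}_n(\beta)) = P(X_\infty = 0).$$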

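Proof: Since $\beta \in (0, 1/2)$, there exists $(\gamma, \alpha) \in (0, 1/2) \times (0, 1)$ satisfying the condition $\gamma(1 - \alpha) = \beta$ (e.g., letting $\gamma = (1 + 2\beta)/4$ and $\alpha = (1 - 2\beta)/(1 + 2\beta)$ satisfies the condition). We take $m = \alpha n$ in Lemma 2 and let $\{L_i^{(1)}\}$ denote the process defined by (12) and (13) with $m = \alpha n$. Then, for any $\epsilon > 0$, one obtains by applying Lemma 2 that, conditional on the event $\mathcal{C}_{\alpha n}(\rho(\gamma)) \cap \mathcal{G}_{\alpha n, n}(\gamma)$ with $\rho(\gamma)$ defined in Lemma 2, the inequality

$$L_n^{(1)} \le -2^{\gamma(1 - \alpha)n} \epsilon \alpha n$$

holds, which in turn implies

$$\left\{ X_n \le 2^{-2^{\gamma(1-\alpha)n} \epsilon\alpha n} \right\} \supset \mathcal{C}_{\alpha n}(\rho(\gamma)) \cap \mathcal{G}_{\alpha n,n}(\gamma). \tag{15}$$

For any $n > (\epsilon\alpha)^{-1}$, $\beta n < \gamma(1 - \alpha)n + \log_2 (\epsilon\alpha n)$ holds, so that one obtains

$$\mathcal{D}_n(\beta) \supset \left\{ X_n \le 2^{-2^{\gamma(1-\alpha)n} \epsilon\alpha n} \right\}. \tag{16}$$

From (15) and (16), as well as the independence of $\mathcal{C}_{\alpha n}(\rho(\gamma))$ and $\mathcal{G}_{\alpha n,n}(\gamma)$, one consequently has

$$P(\mathcal{D}_n(\beta)) \ge P(\mathcal{G}_{\alpha n,n}(\gamma))\, P(\mathcal{C}_{\alpha n}(\rho(\gamma))). \tag{17}$$

Hence, using (9) and Proposition 1,

$$\lim_{n \to \infty} P(\mathcal{D}_n(\beta)) \ge \lim_{n \to \infty} P(\mathcal{G}_{\alpha n,n}(\gamma))\, P(\mathcal{C}_{\alpha n}(\rho(\gamma))) = P(X_\infty = 0). \tag{18}$$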



F. Bootstrapping: Rate-Dependent Stage

So far, our treatment of the random variable $S_{m,n}$ has been restricted to the regime of the law of large numbers. In order to obtain a rate-dependent bound, we have to go further and treat $S_{m,n}$ within the regime of the central limit theorem.

Definition 4: For $t \in \mathbb{R}$ and for a function $f(n) = o(\sqrt{n})$, the event $\mathcal{H}_{m,n}(t)$ is defined as

$$\mathcal{H}_{m,n}(t) = \left\{ S_{m,n} \ge \frac{1}{2}\left(n - m + t\sqrt{n - m}\right) + f(n - m) \right\}.$$

Noting that the random variable $S_{m,n}$ is a sum of $(n - m)$ i.i.d. Bernoulli random variables, and that its mean and variance are $(n - m)/2$ and $(n - m)/4$, respectively, the following lemma is a direct consequence of the central limit theorem.

Lemma 3: Let $m < n$. Then, for any $t \in \mathbb{R}$,

$$\lim_{n - m \to \infty} P(\mathcal{H}_{m,n}(t)) = Q(t).$$
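The convergence in Lemma 3 can be observed numerically by comparing the exact binomial tail with $Q(t)$. This is a sketch under the simplifying assumption $f \equiv 0$; the function names are illustrative, and $k$ stands for $n - m$.

```python
import math
from statistics import NormalDist


def q_function(t):
    """Q(t) = 1 - Phi(t), the upper tail of the standard normal distribution."""
    return 1.0 - NormalDist().cdf(t)


def prob_event_h(k, t):
    """Exact P(S >= k/2 + t*sqrt(k)/2) for S ~ Binomial(k, 1/2), with f = 0."""
    threshold = max(math.ceil(k / 2 + t * math.sqrt(k) / 2), 0)
    tail = sum(math.comb(k, s) for s in range(threshold, k + 1))
    return tail / 2 ** k


for k in (100, 400, 10000):
    print(k, prob_event_h(k, 1.0), q_function(1.0))
```

The gap between the exact tail and $Q(t)$ shrinks as $k$ grows, at the rate expected from the discreteness of the binomial distribution.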

Proposition 3: For an arbitrary function $f(n) = o(\sqrt{n})$,

$$\liminf_{n \to \infty} P\left( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \right) \ge Q(t)\, P(X_\infty = 0).$$

Proof: For a fixed $\beta \in (0, 1/2)$, we take $m = 4 \log_2 n$ in Lemma 2 and let $\{L_i^{(2)}\}$ denote the process defined by (12) and (13) with this choice of $m$.

Conditional on the event $\mathcal{D}_m(\beta)$, one obtains, from (14), the inequality

$$L_n^{(2)} \le 2^{S_{m,n}} \left[ -2^{\beta m} + (n - m - S_{m,n}) \log_2 q \right]. \tag{20}$$

Let $\mathcal{H}_{m,n}(t)$ be the event defined in Definition 4 for a fixed $t \in \mathbb{R}$ and for an arbitrarily chosen function $f(n) = o(\sqrt{n})$.



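Conditional on $\mathcal{D}_m(\beta) \cap \mathcal{H}_{m,n}(t)$, $L_n^{(2)}$ is bounded from above as

$$L_n^{(2)} \le 2^{\frac{1}{2}(n-m) + \frac{1}{2}t\sqrt{n-m} + f(n-m)} \left[ -2^{\beta m} + \left( \frac{n - m - t\sqrt{n-m}}{2} - f(n-m) \right) \log_2 q \right],$$

which implies that there exists $n_0$ such that for all $n \ge n_0$, the condition

$$\left\{ X_n \le 2^{-2^{\frac{1}{2}(n-m) + \frac{1}{2}t\sqrt{n-m} + f(n-m)}} \right\} \supset \mathcal{D}_m(\beta) \cap \mathcal{H}_{m,n}(t) \tag{21}$$

is satisfied. From this observation, as well as the independence of $\mathcal{D}_m(\beta)$ and $\mathcal{H}_{m,n}(t)$, one has

$$P\left( X_n \le 2^{-2^{\frac{1}{2}(n-m) + \frac{1}{2}t\sqrt{n-m} + f(n-m)}} \right) \ge P(\mathcal{D}_m(\beta))\, P(\mathcal{H}_{m,n}(t)). \tag{22}$$

Thus,

$$\liminf_{n \to \infty} P\left( X_n \le 2^{-2^{\frac{1}{2}(n-m) + \frac{1}{2}t\sqrt{n-m} + f(n-m)}} \right) \ge \lim_{n \to \infty} P(\mathcal{D}_m(\beta))\, P(\mathcal{H}_{m,n}(t)) = P(X_\infty = 0)\, Q(t). \tag{23}$$

Since $m = o(\sqrt{n})$, one can safely absorb possible effects of $m$ into the function $f$. This completes the proof. ■

G. Converse

In this subsection, we discuss the converse, in which probabilities that $X_n$ takes small values are bounded from above.

Proposition 4: For an arbitrary function $f(n) = o(\sqrt{n})$,

$$\limsup_{n \to \infty} P\left( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \right) \le Q(t)\, P(X_\infty = 0).$$

Proof: Fix a process $\{X_n\}$. Let $\{\tilde{X}_n\}$ be the random process defined as

$$\tilde{X}_i = X_i, \qquad \text{for } i = 0, \ldots, m, \tag{24}$$

$$\tilde{X}_i = \begin{cases} \tilde{X}_{i-1}^2 & \text{if } B_i = 1, \\ \tilde{X}_{i-1} & \text{if } B_i = 0, \end{cases} \qquad \text{for } i > m. \tag{25}$$

The inequality $X_i \ge \tilde{X}_i$ holds on the sample-path basis for all $i \ge 0$, which implies

$$P(X_n \le a) \le P(\tilde{X}_n \le a)$$

for any $a$. One also has

$$\log_2 \log_2 (1/\tilde{X}_{m+k}) = S_{m,m+k} + \log_2 \log_2 (1/X_m).$$

The central limit theorem dictates that $\frac{2}{\sqrt{k}}(S_{m,m+k} - k/2)$ asymptotically follows the standard Gaussian distribution, so that, for any fixed $m$ and for an arbitrary function $f(k) = o(\sqrt{k})$, one has

$$\begin{aligned} P\left( \tilde{X}_{m+k} \le 2^{-2^{(k + t\sqrt{k})/2 + f(k)}} \,\middle|\, X_m \right) &= P\left( \log_2 \log_2 (1/\tilde{X}_{m+k}) \ge \frac{k}{2} + \frac{t\sqrt{k}}{2} + f(k) \,\middle|\, X_m \right) \\ &= Q(t) + o(1) \end{aligned} \tag{26}$$

as $k \to \infty$. For any fixed $\delta \in (0, 1)$ and $m > 0$,

$$\begin{aligned} &\limsup_{n \to \infty} P\left( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \right) \\ &\quad \le \limsup_{k \to \infty} P\left( X_{m+k} \le 2^{-2^{(k + t\sqrt{k})/2 + f(k)}} \right) \\ &\quad \le \limsup_{k \to \infty} \left\{ P\left( \tilde{X}_{m+k} \le 2^{-2^{(k + t\sqrt{k})/2 + f(k)}} \,\middle|\, X_m \le \delta \right) P(X_m \le \delta) + P\left( X_{m+k} \le \frac{\delta}{2},\ X_m > \delta \right) \right\}, \end{aligned} \tag{27}$$

where the shift of the index by the fixed $m$ is absorbed into the arbitrary function $f$, as in the proof of Proposition 3. From Fatou's lemma,

$$\limsup_{k \to \infty} P\left( X_{m+k} \le \frac{\delta}{2},\ X_m > \delta \right) \le P\left( X_\infty \le \frac{\delta}{2},\ X_m > \delta \right). \tag{28}$$

On the basis of (26)-(28), one obtains

$$\limsup_{n \to \infty} P\left( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \right) \le Q(t)\, P(X_m \le \delta) + P\left( X_\infty \le \frac{\delta}{2},\ X_m > \delta \right). \tag{29}$$

Since this is true for all $m$, we conclude that

$$\begin{aligned} \limsup_{n \to \infty} P\left( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \right) &\le \lim_{m \to \infty} \left\{ Q(t)\, P(X_m \le \delta) + P\left( X_\infty \le \frac{\delta}{2},\ X_m > \delta \right) \right\} \\ &= Q(t)\, P(X_\infty = 0) \end{aligned} \tag{30}$$

holds, where we have used the almost-sure convergence of $X_m$ to $X_\infty$ (property 1 in Sect. IV-C). ■

Putting Propositions 3 and 4 together, we arrive at the following theorem.

Theorem 2: For an arbitrary function $f(n) = o(\sqrt{n})$,

$$\lim_{n \to \infty} P\left( X_n \le 2^{-2^{(n + t\sqrt{n})/2 + f(n)}} \right) = Q(t)\, P(X_\infty = 0).$$

In applying Theorem 2 to $\{Z_n\}$, it should be noted that $P(Z_\infty = 0) = I(W)$ holds. Theorem 1 is proved straightforwardly on the basis of Theorem 2 via the argument at the end of Sect. II.

V. Discussion

A. Extension to Construction with a Larger Matrix

Polar codes can be constructed on the basis of a matrix larger than the $2 \times 2$ matrix $F$ in (1). Korada, Şaşoğlu, and Urbanke [6] have provided a full characterization of whether a matrix induces channel polarization. They have shown that if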



an $\ell \times \ell$ matrix $G$ is polarizing, then given a symmetric B-MC $W$,

$$\lim_{n \to \infty} P\left( Z_n \le 2^{-\ell^{n\beta}} \right) = I(W)$$

holds for any $\beta < E(G)$, where $E(G)$ is the exponent of the matrix $G$ defined in [6]. For a non-polarizing matrix, the exponent $E(G)$ is zero.

Our analysis can be extended to obtain a rate-dependent result for channel polarization using a larger matrix. The extension includes introduction of a sequence $\{B_i\}$ of i.i.d. random variables with $P(B_1 = k) = 1/\ell$ for $k = 1, 2, \ldots, \ell$. Let $\{D_1, D_2, \ldots, D_\ell\}$ be "partial distances" of the matrix $G$ defined in [6]. The exponent $E(G)$ is given by the mean of the random variable $\log_\ell D_{B_1}$. Let $V(G)$ be the variance of the random variable $\log_\ell D_{B_1}$. Our result in this direction is the following:

$$\lim_{n \to \infty} P\left( Z_n \le 2^{-\ell^{\,nE(G) + t\sqrt{nV(G)} + f(n)}} \right) = Q(t)\, I(W).$$

The worst case of polarizing partial distances is given by the case where only one of $\{D_1, D_2, \ldots, D_\ell\}$ is equal to 2 and the rest are equal to 1. Since $E(G) = \frac{1}{\ell} \log_\ell 2$ and $V(G) = \left( \frac{\log_\ell 2}{\ell} \right)^2 (\ell - 1)$ for the worst case, a universal bound is obtained as

$$\lim_{n \to \infty} P\left( Z_n \le 2^{-2^{\left( n + t\sqrt{(\ell - 1)n} \right)/\ell + f(n)}} \right) \ge Q(t)\, I(W), \tag{32}$$

which can be regarded as a refinement of Theorem 8 in [6].
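The worst-case expressions for $E(G)$ and $V(G)$ follow directly from the definitions as the mean and variance of $\log_\ell D_{B_1}$, and they can be checked numerically. The sketch below takes a list of partial distances as given numbers; the function name and the sample inputs are assumptions. In particular, the list (1, 2) for the $2 \times 2$ matrix $F$ is assumed here because it reproduces the exponent $(n + t\sqrt{n})/2$ of Theorem 2.

```python
import math


def exponent_and_variance(partial_distances):
    """Mean and variance of log_l(D_B) for B uniform on the l indices."""
    l = len(partial_distances)
    logs = [math.log(d, l) for d in partial_distances]
    mean = sum(logs) / l
    var = sum((x - mean) ** 2 for x in logs) / l
    return mean, var


# Assumed partial distances for the 2 x 2 matrix F.
print(exponent_and_variance([1, 2]))

# Worst case for l = 4: a single partial distance equal to 2.
l = 4
e_worst, v_worst = exponent_and_variance([2, 1, 1, 1])
print(e_worst, math.log(2, l) / l)
print(v_worst, (math.log(2, l) / l) ** 2 * (l - 1))
```

For the list (1, 2) this gives $E = 1/2$ and $V = 1/4$, so that $nE + t\sqrt{nV} = (n + t\sqrt{n})/2$.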

B. Minimum Distance and ML Decoding 

We return to the original construction of polar codes on the basis of the $2 \times 2$ matrix $F$. Polar codes are linear codes, and their generator matrices are obtained from the matrices of the form $F^{\otimes n}$ via removal of some rows (corresponding to "shortening") and reordering of the remaining rows. Hussami, Korada, and Urbanke [7] studied the class of linear codes constructed from $F^{\otimes n}$ via shortening, and showed using minimum distance analysis that the error probability of such codes is $\omega(2^{-2^{\beta n}})$ (in the standard Landau notation) for $\beta > 1/2$. This fact means that polar codes with SC decoding achieve the best performance as $n \to \infty$ up to the dominant term in the double exponent of the error probability. In this subsection, it is shown that the minimum distance analysis does not give the second dominant term in the double exponent of polar codes with SC decoding. This fact implies that SC decoding is not necessarily optimal in the second dominant term.

Proposition 5: For any codes whose generator matrix consists of $2^n R$ distinct rows of $F^{\otimes n}$ and any fixed $t > Q^{-1}(R)$, the error probability of ML decoding is $\omega(2^{-2^{(n + t\sqrt{n})/2}})$.

Proof: Let $\mathcal{I} \subset \{0, 1, \ldots, 2^n - 1\}$ denote the set of indices of rows of $F^{\otimes n}$ chosen to form the generator matrix. The minimum distance of the codes is given by $\min_{i \in \mathcal{I}} 2^{w(i)}$, where $w(i)$ denotes the Hamming weight of the binary expansion of $i$. Let the minimum distance of a code be $2^d$. Since the number of rows with weight $2^i$ of the matrix $F^{\otimes n}$ is $\binom{n}{i}$, one obtains the inequality

$$\sum_{i=d}^{n} \binom{n}{i} \ge 2^n R, \tag{33}$$

or equivalently,

$$P(S_n \ge d) \ge R, \tag{34}$$

where $S_n$ is a sum of $n$ i.i.d. Bernoulli random variables with probability one half. Let $d = n/2 + t\sqrt{n}/2$ for any fixed $t \in \mathbb{R}$. Then,

$$P\left( \frac{S_n - n/2}{\sqrt{n}/2} \ge t \right) \ge R. \tag{35}$$

From the central limit theorem, the left-hand side converges to $Q(t)$ as $n \to \infty$. Hence, the condition $t \le Q^{-1}(R)$ is necessary for asymptotic existence of the codes satisfying the conditions stated in the Proposition, completing the proof. ■
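The counting argument in (33) can be evaluated exactly for moderate $n$ and compared with the central-limit prediction $d \approx n/2 + Q^{-1}(R)\sqrt{n}/2$. The following is a sketch; the function names and the sample values of $n$ and $R$ are assumptions.

```python
import math
from statistics import NormalDist


def max_log2_min_distance(n, rate):
    """Largest d such that sum_{i >= d} C(n, i) >= 2^n * rate, as in (33)."""
    target = rate * 2 ** n
    tail = 0
    for d in range(n, -1, -1):
        tail += math.comb(n, d)
        if tail >= target:
            return d
    return 0


def clt_prediction(n, rate):
    """n/2 + Q^{-1}(R) * sqrt(n) / 2, with Q^{-1}(p) = Phi^{-1}(1 - p)."""
    q_inv = NormalDist().inv_cdf(1.0 - rate)
    return n / 2 + q_inv * math.sqrt(n) / 2


n, rate = 400, 0.25
print(max_log2_min_distance(n, rate), clt_prediction(n, rate))
```

For $n = 2$ and $R = 3/4$ the function returns $d = 1$: the rows of $F^{\otimes 2}$ have weights 1, 2, 2, 4, so any three of them include a row of weight at most 2.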

It should be noted that Proposition 5 also means that the minimum distance of the codes considered is asymptotically at most $2^{(n + Q^{-1}(R)\sqrt{n})/2}$.

The prefactor of the second dominant term in the double exponent is $Q^{-1}(R)/2$ in Proposition 5, which is strictly larger than the prefactor $Q^{-1}(R/I(W))/2$ in Theorem 1 whenever $I(W) < 1$. One can argue that it might be due to the channel-independent nature of the analysis leading to Proposition 5, which is reflected in the absence of the channel $W$ in the result. In any case, whether polar codes with SC decoding are optimal in terms of the double exponent up to the second dominant term is an open problem, and thus needs further investigation.

VI. Conclusion 

We have derived a rate-dependent upper bound of the best achievable block error probability of polar codes with successive cancellation decoding. The derivation is based on that of the previous rate-independent results [1], [4], which discuss channel polarization in the regime of the law of large numbers, extending it to the regime of the central limit theorem.

We would like to mention that the argument given in this 
paper can also be applied to the problem of lossy source coding 
discussed in Q. 